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Abstract. The self-consistent Hartree-Fock-Bogoliubov problem in large boxes can be solved 
accurately in the coordinate space with the recently developed solvers hfb-AX (2D) and 
MADNESS-HFB (3D). This is essential for the description of superfluid Fermi systems with 
complicated topologies and significant spatial extend, such as fissioning nuclei, weakly-bound 
nuclei, nuclear matter in the neutron star rust, and ultracold Fermi atoms in elongated traps. 
The HFB-AX solver based on B-spline techniques uses a hybrid MPI and OpenMP programming 
model for parallel computation for distributed parallel computation, within a node multi- 
threaded LAPACK and BLAS libraries are used to further enable parallel calculations of large 
eigensystems. The madness-hfb solver uses a novel multi-resolution analysis based adaptive 
pseudo-spectral techniques to enable fully parallel 3D calculations of very large systems. In 
this work we present benchmark results for hfb-AX and madness-hfb on ultracold trapped 
fermions. 



1. Introduction 

The Hartree-Fock-Bogoliubov (HFB) equation of the Density Functional Theory (DFT) is 
suitable for describing superfluid Fermi systems by properly accounting for the self-consistent 
coupling between the particle-hole and particle-particle mean fields. Recently, we solved HFB 
equations in complicated geometries to describe nuclei and ultra-cold polarized Fermi gases pLj. 
The general HFB equation for a polarized system can be written as: 

^ (1) 

where ha and /it are Hartree-Fock Hamiltonians for the two spin components, and are the 
corresponding chemical potentials, and A is the pairing potential. 

In some cases, solving the HFB equation in large boxes in coordinate space is essential. 
Weakly-bound nuclei and the large-amplitude nuclear collective motion such as fission and fusion 
are examples of problems that require large-box calculations [2j. The ultra-cold atoms trapped 
in elongated optical traps also require a description involving large spatial dimensions [1] , as well 





as HFB description of nucleonic pasta phases in the neutron star crusts [3]. Ah these problems 
are both interesting and important but the underlying calculations are challenging. 

Many HFB solvers used in the nuclear physics context are based on the basis expansion 
method employing harmonic oscillator wave functions. The configuration-space method is 
efficient but offers a fairly poor accuracy for cases involving weakly-bound systems and large 
deformations, see discussion in Ref. |2j. On the other hand, solving HFB equations directly in 
coordinate-space can offer very precise results. Unfortunately, because of numerical challenges 
involved, HFB calculations in non-spherical geometries require high computation cost. In this 
context, multi-core processor architectures such as the Jaguar and Kraken Cray XT5 with 
12 cores per node, or Hopper Cray XE6 supercomputer with 24 cores per node, promise to 
revolutionize the deformed HFB problem. To fully take advantage of unique computational 
capabilities, HFB solvers should be adapted and improved in terms of scalability. 

To this end, we developed two parallel HFB codes: 2D hfb-ax for axially symmetric 
systems and madness-hfb for fully 3D systems. Both solvers take advantage of modern multi- 
core architectures. The following sections, overview computational and numerical techniques 
employed by these two codes and present benchmark calculations for ultra-cold Fermi atoms. 
To illustrate the deformed HFB problem in coordinate space, we present an extreme application 
to novel pairing phases in polarized cold Fermi gases in extremely elongated traps 



2. Two-dimensional HFB solver hfb-ax 

In HFB-AX |2j, the wave functions are discretized on a 2-D grid (ra, 
B-splines: 
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where the coefficients C!^ can be obtained by diagonalizing the Hamiltonian matrix. To reduce 
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Figure 1. Left: performance of the complex eigensolver ZGEEV_a using different number of 
threads (ZGEEV_a is a modified version of LAPACK's ZGEEV). Right: performance of the 
complex matrix multiplication routine ZGEMM. The two routines are employed by hfb-ax and 
tested on Jaguar with the xt-libsci/10.5.0 library. 



computational costs, we assume reflection symmetry; hence, the intrinsic parity vr is a good 
quantum number that can be used to enumerate HFB eigenstates. Another quantum number 
preserved by hfb-ax is the angular momentum projection O on the symmetric axis. The 
boundary conditions are implemented through first- and second-order derivative operators. The 



precision of computations depends on the mesh size, order of B-sphnes, and the box size. Those 
parameters determine the numerical effort involved. 

In HFB-AX, the diagonalization of the complex Hamiltonian matrix takes the bulk of the 
computing time. Diagonalization blocks of given il. and vr quantum numbers are assigned to 
computing cores through MPI communications. We modified the LAPACK diagonalization 
routine ZGEEV to calculate only the selected eigenvectors. For calculations of a heavy nucleus 
with the mesh size of 0.6 fm, 10-th order B-splines, and a box size of 24 fm, the rank of 
Hamilton matrix is A^=4608. To take the advantage of multi-core architectures, we use the 
threaded LAPACK library, which can speed up the diagonalization by a factor of 3 with 6 
threads, as shown in Fig. [^a). For the matrix multiplication problem in Fig. [IJb), the routine 
ZGEMM scales perfectly. For other matrix operations, we use OpenMP for parallel calculations. 
The modified Broyden method is used to accelerate the iteration convergence [2]. The hybrid 
MPI and OpenMP programming have made hfb-ax an accurate and fast HFB solver for large 
box calculations. Besides the threaded library on Cray machines, one can also utilize the 
threaded AMD's ACML library or the threaded Intel MKL library. In the near future, the 
multi-core+GPU Cray XT6 machine is expected, and the corresponding MAGMA library [5] 
can further improve the capability of hfb-ax. 

The precision of hfb-ax has been demonstrated in calculations of weakly-bound nuclei in 
which the coupling to scattering continuum is essential [21 16]. In large-box calculations, very 
dense quasi-particle spectrum is obtained, and the non-resonant continuum contribution can be 
precisely taken into account by means of the direct integration [6]. Furthermore, hfb-ax is 
expected to provide precise solutions for the continuum QRPA description of excited states by 
means of the recently developed finite-amplitude method [7|. The recent success of hfb-ax is 
the prediction of rapidly oscillating pairing potential in highly elongated traps [4j by using the 
SLDA [8j and ASLDA [9j energy density functionals for polarized Fermi gases. 

3. 3D MADNESS-HFB Solver 

The software madness-hfb \10\ [TT] is a fast 0{Nloge) method based on the multiresolution 
analysis and low separation rank methods of representing and approximating functions and 
operators for solving Schrodinger and Lippman-Schwinger equations up to user-determined 
precision e. MADNESS is an acronym for Multiresolution Adaptive Numerical Environment for 
Scientific Simulations [T2]. The application of multiresolution analysis separates the behavior of 
functions and operators at different length scales in a functional way. Interesting mathematical 
and computational feature in our MADNESS library implementation is that each of the operators 
and wave functions has its own adaptive structure of refinement to achieve and guarantee the 
desired accuracy. 

The representation of functions and operators in MADNESS is based on an adaptive 
pseudo-spectral representations of functions and operators using the discontinuous Alpert's 
multiwavelets \\.2>\ [T3] and the low-separation rank representations and applications of Green's 
functions. Alpert's multiwavelets have compact support. For representing functions in 3D, 
we use tensor products of ID multiwavelets. The two-scale relations provides for an adaptive 
approximation of the expansion in terms of multiwavelet basis as a function of accuracy in terms 
of the regularity of the solutions. The support of the bases functions in each level resembles 
that of structure adaptive refinement. A schematic figure of such a refinement process is shown 
in Fig. [21 Using two-scale relations, the multiwavelet basis provides a way of constructing an 
adaptive pseudo-spectral method. The multiwavelet basis is a local basis of degree k defined on 
the interval (0, 1) generated from scaling functions which are defined by shifted, rescaled and 
orthonormalized Legendre polynomials from degree to /c — 1 from (—1,1) with a value of 
defined outside of (0,1). The subspace, of L2((0, 1)), spanned by scaling functions of degree 
k is denoted by = (jjkix) = ^Jk + l/2Pi{2x - 1) on (0, 1) where Pi (x) is the z-th Legendre 




Figure 2. A schematic picture illustrating the two-scale generated refinement process of a 
multi-resolution representation of a function in 2D with 3-levels. 

polynomial on (—1, 1). For each level n, we further define an ascending sequence of the subspaces 
= (/)^;(x) = 2"/20j(2"a;-O,J=O,....,A:-l, / = 0, 2" - 1. Specifically, Vj^ C V^, the 
multiwavelets basis, denoted by W^, is the orthogonal complement of Vj} in . For each level 
n, we can further construct WJ} in the same way of shifting and rescaling the basis functions of 
W° as in to construct W^. Thus, 

V^^ C C C ... C Vl 

= + + + ... + w^_^. 

The notable features of the multiwavelet bases are that at each level the support of the basis 
functions are of width 2~", and that there is an exact algebraic relationship between the basis 
functions at level n and level n + 1. In particular, the coefficients of a representation of a smooth 
function in the multiwavelet basis decays proportionally to the width of the interval at level 
n. Thus, we can estimate the accuracy of the representation and truncate coefficients below a 
specified threshold for a sparse representation. 

In 3D, we use a tensor product basis generated from the ID multiwavelet basis; the 
representation of operators uses a non-standard approximation or a low separation rank 
approximation based on Gaussians. The structure of the support of a function resembles that 
of an oct-tree. Each node of the oct-tree consists of a tensor of coefficients. The truncation of 
the sparse coefficients produce a pruned tree. The oct-tree is distributed across the nodes of a 
massively parallel computer referenced by a global hash-table for each node of the tree. The 
singular integral operators are represented using a low-separation rank approximation of the 
Green's functions [T5] with details described in [16]. 

The MADNESS library uses a combined MPI and pthread parallel computing method in a 
task based computing model with a task graph scheduling and queue on each node. On each 
node of a parallel multicore computer, one core is devoted to processing internode communication 



via MPI, and one core is devoted to handling thread scheduling and task allocation and queue 
scheduling within the node. The code is polymorphic with the use of C++ templates. Each 
operation on functions and the application of operators are defined in terms of tasks to be 
performed on the nodes or tensors of the "oct-trees" in the representation the functions or 
operators. In addition, a data path and flow path dependency analysis is performed to determine 
when different steps of the code can be overlapped and scheduled in each node's task queue so 
that distributed multi-threading computation can be performed. Furthermore, since diff^erent 
functions and operations have different data and work-loads, a user directed dynamic load- 
balancing of data memory can be performed to permit more efficient data and work load 
distribution. 

The solution methodology for the madness-hfb consists of two main steps: (i) 
diagonalization of the Hamiltonian matrix using wave functions expanded in the multiwavelet 
bases in 3D, and (ii) computation of each of the wave functions using the associated Lippman- 
Schwinger equation. Recall that the Green's function for the operator —A — A is e~^''/r, 
the Yukawa potential. The solving procedure is similar to that for solving Hartree-Fock 
problems [10]. In short, the solution algorithm is: 

• Obtain a set of guess wave-function uq in the multiwavelet basis. 

• Iterate until convergence: 

— Compute and diagonalize the Hamiltonian matrix T-L to obtain the latest orthogonal 
wave functions; 

— Compute densities, properties and gradients; 

— Compute the approximation of the Yukawa potential for each eigenvalue A; 

— Solve the Lippman-Schwinger equation by convolving with the Green's function based 
on Yukawa potential, u„+i = G-k Vun- 

• Compute observables. 

We carried out benchmark calculations with hfb-ax and madness-hfb for 100 ultra-cold 
fermionic atoms in elongated traps. The cold Fermions at the unitary limit can be described 
by the superfluid density functional SLDA [5]. The single-particle Hamiltonian of SLDA can be 
written as: 

where the deformed external trap potential is: 

cj2(x2 + + /rfy 
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The densities of spin-up [p^) and spin-down atoms, the pairing densities k, and pairing 
gaps A can be expressed in terms of the HFB two-component eigenvectors ([1]): 

i i 

'^W = J2fiUii^Ki^),^i^) = -9effir)K{r), (6) 

i 

where p = p-\ + Pi and geff{r) is the regularized pairing strength. In Eq. fi = 

[1 + exp{Ei / kT)]~^ and the temperature is /cT=0.01. The parameters a, /3 and 7 in Eq.Q 
are taken as 1.14, —0.553 and —1/0.0906, respectively [8j. We work in trap units for which 
fi = m = 0} = 1. The trap aspect ratio in ([5]) denotes the trap elongation. In experiments, 
the adopted optical trap is highly elongated with tj up to 50 [T7]. The highly elongated trap 



is interesting because it provides a connection to quasi one-dimensional systems. In madness- 
HFB, we employed very large boxes: {x,y, z)[-WO, 100] for r/ = 5 and [-160, 160] for t] = 16. In 
HFB-AX, the 2D box sizes are 10.5x35 (r/ = 5) and 9.1x70 {rj = 16). The potential depth Vq is 
12 and 10 for rj = 5 and r] = 16, respectively. 
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Figure 3. Pairing densities k{x = 0,y = 0, z) calculated by madness-hfb and hfb-ax for 100 
polarized atoms in an elongated trap with r/ = 5 (left) and r/ = 16 (right). 

Figure [3] displays the calculated pairing densities k{x = 0,y = 0, z) for ultra-cold Fermi 
gas with polarization of 0.2. The energy cutoff is taken as 6.75 and 5.05 for r] = 5 and 16, 
respectively. It is seen that the two solvers agree very well at ?7=5. At rj = 16, there appears a 
very small difference between HFB solutions; this may be an indication that an even larger box 
is needed in hfb-ax. It is to be noted that at r/ = 16 there appear coexisting solutions which 
are close in energy [181 H] • Figure H] shows the 3D pairing density distribution computed by 
MADNESS-HFB for very elongated system with r] = 16. The characteristic transversal oscillations 
of the pairing field are indicative of the Larkin-Ovchinnikov phase 
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Figure 4. Pairing density K{x,y,z), calculated by madness-hfb for the elongated trap with 
rj = 16. The box scale in view is x[-12, 12], y[-12, 12] and -z[-32, 32]. 



In general, the computational effort of 3D HFB calculations is significantly greater than 
in the 2D case. Thus, the adaptive representation using multiwavelet techniques and parallel 



computing are crucial for the success of madness-hfb. To put things in perspective, madness- 
HFB takes about 5 minutes per iteration on 5000 cores for r] = 5 and rj = 16, while hfb-ax 
takes about 10 minutes per iteration for r/ = 5 on 400 cores and 25 minutes for 77 = 16 on 800 
cores. From this trend, we conclude that the parallel performance of madness-hfb is superior 
to HFB-AX in large-box calculations. 

4. Conclusions 

The coordinate-space HFB solvers hfb-ax and madness-hfb have been developed for large- 
box calculations of superfluid Fermi systems. The 2D solver hfb-ax has been adapted to 
the multi-core supercomputers by using a hybrid MPI and OpenMP programming model. 
The 3D solver madness-hfb is based on multi-resolution multiwavelet techniques and more 
sophisticated hybrid MPI and pthreads based task parallelism programming methodologies. 
A scaling benchmark test for 100 ultracold Fermions in elongated traps has been carried 
out. Developments are underway to efficiently use the next generation of multi-core-|-GPU 
architectures for more demanding nuclear problems. 
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